Discriminant analysis with singular covariance matrices. A method incorporating cross-validation and efficient randomized permutation tests

Reliable predictions of financial crises have always safeguarded the stability of the financial system as a whole and of the banking sector in particular. This thesis predicts systemic banking crises in EU-14 countries several quarters before they become apparent. It uses the most widely used variables (macroeconomic, banking and market) and follows two approaches: a binary one and a multilevel one. Under the binary approach, classification models are derived with Discriminant Analysis, Linear Regression, Logistic Regression and Probit Regression to give early warning of crises 12 to 7 quarters before they occur. The performance of this analysis is also compared with the newer and more promising methods of Classification Tree, Random Forest and C5. In addition, the thesis proposes a new measure for threshold selection and goodness of fit (GoF) of prediction models, together with a new combined classification method. Performance is assessed by out-of-sample testing using country-blocked cross-validation. The analysis is carried out and the prediction models are derived on thirteen of the fourteen countries in the sample (in-sample). The resulting models are then applied to the fourteenth country, which had been excluded from the original sample (out-of-sample), and their predictions are checked against that country's actual data. This procedure is repeated fourteen times, leaving a different country out each time, and the results are averaged over the repetitions.
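
A minimal sketch of the country-blocked (leave-one-country-out) cross-validation described above, in Python with only the standard library. The function names, the country codes in the usage note and the `evaluate` callback are hypothetical illustrations, not code from the thesis. Model fitting is left to the callback.

```python
def country_blocked_folds(countries):
    """Yield (held_out_country, train_idx, test_idx) for leave-one-country-out CV.

    `countries` gives the country label of each observation (e.g. one row per
    country-quarter). Each fold holds out every row of one country and trains
    on all rows of the remaining countries.
    """
    # dict.fromkeys keeps first-appearance order, so the folds are deterministic.
    for held_out in dict.fromkeys(countries):
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        test_idx = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train_idx, test_idx


def mean_over_folds(countries, evaluate):
    """Average a per-fold score over all held-out countries.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that fits a
    model on the training rows and returns its score on the held-out country.
    """
    scores = [evaluate(tr, te) for _, tr, te in country_blocked_folds(countries)]
    return sum(scores) / len(scores)
```

With fourteen distinct country labels, `country_blocked_folds` yields exactly fourteen folds, matching the fourteen repetitions in the text. Blocking by country rather than by random rows keeps quarters from the same country out of both training and test sets at the same time.
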
Using out-of-sample testing, the thesis achieves 82.4% correct classification (accuracy), a 78.4% true positive rate (TPR) and an 80.6% positive predictive value (PPV). The multilevel approach distinguishes two levels (periods) of prediction of systemic banking crises. The first level, called early warning, covers the period 12 to 7 quarters before the onset of the crisis. The second level, called late warning, covers 6 to 1 quarters before onset. This multilevel classification uses Neural Networks, Multinomial Logistic Regression and Multinomial Linear Discriminant Analysis. With the same out-of-sample testing as in the first approach, the best-performing method achieves 85.7% correct classification, and that method proves to be Multinomial Linear Discriminant Analysis. With this analysis and the proposed models, policy makers can detect an impending crisis up to three years ahead using only publicly available data, and can then apply the appropriate macroprudential policy in each case.
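
The three headline metrics can all be computed from the counts of a binary confusion matrix. The sketch below assumes that a crisis-period observation is the positive class. The function name and the counts in the usage note are hypothetical and are not the thesis's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, TPR, PPV) from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total  # share of all observations classified correctly
    tpr = tp / (tp + fn)          # share of actual crisis periods that were flagged
    ppv = tp / (tp + fp)          # share of alarms that were real crisis periods
    return accuracy, tpr, ppv
```

For example, with made-up counts, `binary_metrics(tp=40, fp=5, tn=45, fn=10)` returns roughly `(0.85, 0.80, 0.89)`. TPR and PPV pull against each other as the alarm threshold moves, which is why threshold selection matters for an early-warning model.
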

Analysis of the Face Recognition Methods Two-Dimensional Principal Component Analysis (2DPCA) and Kernel Fisher Discriminant Analysis Using KNN (K-Nearest Neighbor) Classification

Jurnal Teknologi dan Rekayasa Manufaktur ◽

10.48182/jtrm.v2i2.30 ◽

2020 ◽

Vol 2 (2) ◽

pp. 29-38

Author(s):

Abdur Rohman Harits Martawireja ◽

Hilman Mujahid Purnama ◽

Atika Nur Rahmawati

Keyword(s):

Principal Component Analysis ◽

Discriminant Analysis ◽

Cross Validation ◽

Nearest Neighbor ◽

Principal Component ◽

Component Analysis ◽

K Nearest Neighbor ◽

Fisher Discriminant Analysis ◽

Fisher Discriminant ◽

Kernel Fisher Discriminant Analysis

Human face recognition is an important research field, and many applications have recently adopted it in both commercial and law-enforcement settings. Face recognition is a highly accurate biometric system that identifies a person from the features of their face, and it can be applied to security systems. Many methods can be used for face recognition in security applications. This article discusses two of them, Two-Dimensional Principal Component Analysis and Kernel Fisher Discriminant Analysis, with K-Nearest Neighbor used for classification. Both methods were tested with cross-validation. Results from previous research show that a face recognition system based on Two-Dimensional Principal Component Analysis achieves 88.73% accuracy with 5-fold cross-validation and 89.25% with 2-fold cross-validation. Kernel Fisher Discriminant Analysis with 2-fold cross-validation achieves an average accuracy of 83.10%.
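
Both methods pass extracted features to a K-Nearest Neighbor classifier, which is then evaluated by k-fold cross-validation. The following is a minimal standard-library sketch of that evaluation loop. It assumes the feature vectors have already been extracted; 2DPCA and Kernel Fisher Discriminant Analysis are not implemented here. It also assumes Euclidean distance and a deterministic fold assignment (sample i goes to fold i mod `n_folds`). All names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    # Rank training samples by Euclidean distance to x; majority vote among the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, n_folds=5, k=1):
    """Cross-validated accuracy of k-NN over feature vectors X with labels y."""
    correct = 0
    for f in range(n_folds):
        # Sample i belongs to fold i % n_folds (no shuffling in this sketch).
        test = [i for i in range(len(X)) if i % n_folds == f]
        train = [i for i in range(len(X)) if i % n_folds != f]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    # Every sample is tested exactly once, so divide by the total count.
    return correct / len(X)
```

Calling `k_fold_accuracy(X, y, n_folds=5)` and `k_fold_accuracy(X, y, n_folds=2)` on the same features mirrors the 5-fold versus 2-fold comparison reported above.
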

A Discriminant Analysis of Committed and Voluntary Psychiatric Patients

The Journal of Psychiatry & Law ◽

10.1177/0093185386014001-207 ◽

1986 ◽

Vol 14 (1-2) ◽

pp. 159-176 ◽

Cited By ~ 5

Author(s):

Robert A. Nicholson ◽

Joseph M. Horn

Keyword(s):

Discriminant Analysis ◽

Length Of Stay ◽

Marital Status ◽

Employment Status ◽

Cross Validation ◽

Psychiatric Patients ◽

Maximum Benefit ◽

Education And Employment ◽

Double Cross

Eleven background, diagnostic, and hospitalization characteristics were used to discriminate committed and voluntary psychiatric patients in a double cross-validation design. Diagnosis was more important than individual social and status resources (race, marital status, education, and employment status) in discriminating the two groups of patients. Further, characteristics of hospitalization (length of stay, percentage of patients receiving maximum benefit from treatment, and frequency of discharge referrals) did not contribute significantly to discrimination of the two groups, suggesting that committed and voluntary patients did not differ with regard to the adequacy or effectiveness of treatment in the hospital.
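
In a double cross-validation design, the sample is split into two halves. A classification rule is derived on each half and tested on the other. The sketch below shows that structure. As a hypothetical stand-in for the study's discriminant function, it uses a nearest-centroid rule, which is linear discriminant analysis with an identity covariance matrix. The function names and group labels are assumptions.

```python
import math


def fit_centroids(X, y):
    # Class means; assigning to the nearest mean is LDA with an identity covariance.
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def predict(centroids, x):
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def hit_rate(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(X)


def double_cross_validation(X_a, y_a, X_b, y_b):
    """Derive a rule on each half and return its hit rate on the other half."""
    a_on_b = hit_rate(fit_centroids(X_a, y_a), X_b, y_b)
    b_on_a = hit_rate(fit_centroids(X_b, y_b), X_a, y_a)
    return a_on_b, b_on_a
```

Similar hit rates in both directions suggest that the discrimination between committed and voluntary patients does not depend on which half was used to build the rule.
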

Linear shrinkage estimation of covariance matrices using low-complexity cross-validation

Signal Processing ◽

10.1016/j.sigpro.2018.02.026 ◽

2018 ◽

Vol 148 ◽

pp. 223-233 ◽

Cited By ~ 3

Author(s):

Jun Tong ◽

Rui Hu ◽

Jiangtao Xi ◽

Zhitao Xiao ◽

Qinghua Guo ◽

...

Keyword(s):

Cross Validation ◽

Low Complexity ◽

Linear Shrinkage ◽

Covariance Matrices ◽

Shrinkage Estimation
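
This entry lists only a title and keywords. As background, a common linear shrinkage estimator blends the sample covariance S with a scaled-identity target: (1 - rho) S + rho (tr(S)/p) I. The sketch below only shows why this matters for discriminant analysis with singular covariance matrices. With fewer observations than variables, S is singular, while any rho > 0 gives an invertible estimate, provided S is not all zeros. The choice of target and the function names are assumptions. The paper's low-complexity cross-validation for choosing rho is not reproduced.

```python
def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (n observations, p variables)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
             for j in range(p)] for i in range(p)]


def shrink(S, rho):
    """(1 - rho) * S + rho * (tr(S)/p) * I, for 0 <= rho <= 1."""
    p = len(S)
    mu = sum(S[i][i] for i in range(p)) / p  # average variance, scales the identity target
    return [[(1 - rho) * S[i][j] + (rho * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def determinant(A, tol=1e-12):
    """Determinant by Gaussian elimination with partial pivoting (0.0 if a pivot vanishes)."""
    M = [row[:] for row in A]
    n, det = len(M), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < tol:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det  # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return det
```

For two observations of three variables, `sample_covariance` has rank 1 and a zero determinant. After `shrink(S, 0.5)` the matrix is positive definite and can be inverted inside a discriminant function.
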

Intravoxel Incoherent Motion Diffusion for Identification of Breast Malignant and Benign Tumors Using Chemometrics

BioMed Research International ◽

10.1155/2017/3845409 ◽

2017 ◽

Vol 2017 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Fengnong Chen ◽

Pulan Chen ◽

Hamed Hamid Muhammed ◽

Juan Zhang

Keyword(s):

Support Vector Machine ◽

Diffusion Coefficient ◽

Discriminant Analysis ◽

Cross Validation ◽

Binary Classification ◽

Intravoxel Incoherent Motion ◽

Support Vector ◽

Benign Tumors ◽

Accurate Identification ◽

Malignant And Benign Tumors

The aim of the paper is to distinguish malignant from benign breast lesions using the apparent diffusion coefficient (ADC) and three intravoxel incoherent motion (IVIM) parameters: perfusion fraction f, pseudodiffusion coefficient D*, and true diffusion coefficient D. The study includes 69 malignant cases (including 9 early malignant cases) and 35 benign breast cases, all of which underwent diffusion-weighted MRI at 3.0 T with 8 b-values (0 to 1000 s/mm²). ADC and IVIM parameters were determined in the lesions. To assess their effect on the result, the early malignant cases were treated in turn as advanced malignant and as benign tumors. Predictive models were built with Support Vector Machine Binary Classification (SVMBC, also known as Support Vector Machine Discriminant Analysis, SVMDA) and with Partial Least Squares Discriminant Analysis (PLSDA), and the two were compared. The D value and ADC identify malignant lesions accurately at b = 300 s/mm² when early malignant tumors are counted as advanced malignant (cancer). Using SVMBC with ADC and tissue diffusivity only, the cross-validated classification accuracy is 93.5%. Sensitivity and specificity are 100% and 87.0%, respectively, the cross-validated r² is 0.8163, and the root mean square error of cross-validation (RMSECV) is 0.043. ADC and IVIM provide quantitative measurements of tissue diffusivity related to cellularity. Combined with SVMBC, they give comprehensive and complementary information for differentiating benign from malignant breast lesions.
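
The reported sensitivity, specificity and RMSECV follow standard definitions. The sketch below computes them. It assumes cross-validated predictions are already available, meaning each sample's prediction comes from a model fitted without that sample or its fold. It also assumes a numeric coding (for example 1 = malignant, 0 = benign) for the RMSECV inputs. Function names and labels are hypothetical.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive="malignant"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def rmsecv(y_true, y_cv):
    """Root mean square error of cross-validation.

    y_cv[i] must be the cross-validated prediction for sample i.
    """
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_cv)) / len(y_true))
```

A sensitivity of 100% means no malignant lesion was predicted as benign. The 87.0% specificity means some benign lesions were flagged as malignant.
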

On d-Asymptotics for High-Dimensional Discriminant Analysis with Different Variance-Covariance Matrices

IEICE Transactions on Information and Systems ◽

10.1587/transinf.e95.d.3106 ◽

2012 ◽

Vol E95.D (12) ◽

pp. 3106-3108

Author(s):

Takanori AYANO ◽

Joe SUZUKI

Keyword(s):

Discriminant Analysis ◽

Covariance Matrices ◽

High Dimensional

Multiple Subject Barycentric Discriminant Analysis (MUSUBADA): How to Assign Scans to Categories without Using Spatial Normalization

Computational and Mathematical Methods in Medicine ◽

10.1155/2012/634165 ◽

2012 ◽

Vol 2012 ◽

pp. 1-15 ◽

Cited By ~ 9

Author(s):

Hervé Abdi ◽

Lynne J. Williams ◽

Andrew C. Connolly ◽

M. Ida Gobbini ◽

Joseph P. Dunlop ◽

...

Keyword(s):

Discriminant Analysis ◽

Confidence Intervals ◽

Cross Validation ◽

Regions Of Interest ◽

fMRI Data ◽

Spatial Normalization ◽

Data Table ◽

Statistical Inferences ◽

Random Models ◽

Confusion Matrices

We present a new discriminant analysis (DA) method called Multiple Subject Barycentric Discriminant Analysis (MUSUBADA). It is suited to analyzing fMRI data because it handles datasets with multiple participants who each provide a different number of variables (i.e., voxels), which are themselves grouped into regions of interest (ROIs). Like DA, MUSUBADA (1) assigns observations to predefined categories, (2) gives factorial maps displaying observations and categories, and (3) optimally assigns observations to categories. MUSUBADA handles cases with more variables than observations and can project portions of the data table (e.g., subtables, which can represent participants or ROIs) onto the factorial maps. MUSUBADA can therefore analyze datasets with different numbers of voxels per participant, so it does not require spatial normalization. MUSUBADA's statistical inferences are implemented with cross-validation techniques (e.g., jackknife and bootstrap). Its performance is evaluated with confusion matrices (for fixed and random models) and represented with prediction, tolerance, and confidence intervals. We present an example in which we predict the categories (houses, shoes, chairs, and human, monkey, and dog faces) of images watched by participants whose brains were scanned. This example corresponds to a DA question in which the data table is made of subtables (one per subject) and has more variables than observations.
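
MUSUBADA itself involves more machinery, including factorial maps and subtables per participant or ROI. The sketch below keeps only two ingredients named in the abstract. First, an observation is assigned to the nearest category barycenter. Second, a jackknife (leave-one-out) confusion matrix is tabulated. It uses plain Euclidean distance in the original variable space, so it is a simplification and not the authors' algorithm. Function names and category labels are hypothetical.

```python
import math


def barycenters(X, y):
    # Mean vector (barycenter) of the observations in each category.
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)]
            for lab, rows in groups.items()}


def assign(centers, x):
    # Nearest barycenter by Euclidean distance.
    return min(centers, key=lambda lab: math.dist(centers[lab], x))


def jackknife_confusion(X, y):
    """Leave-one-out: recompute barycenters without observation i, assign i,
    and tally confusion[actual][predicted]."""
    labels = sorted(set(y))
    confusion = {a: {p: 0 for p in labels} for a in labels}
    for i in range(len(X)):
        centers = barycenters(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        confusion[y[i]][assign(centers, X[i])] += 1
    return confusion
```

The diagonal of the resulting table counts correct assignments. A random-model confusion matrix in this sense differs from the fixed-model one because each held-out observation is excluded from the barycenter it is compared against.
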

A COMPARISON OF THE PREDICTIVE POWERS OF TENURE CHOICES BETWEEN PROPERTY OWNERSHIP AND RENTING

International Journal of Strategic Property Management ◽

10.3846/ijspm.2019.7064 ◽

2019 ◽

Vol 23 (2) ◽

pp. 130-141 ◽

Cited By ~ 4

Author(s):

Chun-Chang Lee ◽

Chih-Min Liang ◽

Yang-Tung Liu

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Performance Prediction ◽

Predictive Power ◽

Cross Validation ◽

Binary Logistic Regression ◽

Analysis Model ◽

Linear Modeling ◽

Binary Logistic Regression Model ◽

Hit Rate

This paper compares the predictive powers of hierarchical generalized linear modeling (HGLM), logistic regression, and discriminant analysis with regard to tenure choices between buying property and renting property by sampling the residents of the Greater Taipei area. The results imply that the hit rate and other indicators included in HGLM have better predictive power with regard to tenure choices than the binary logistic regression model and the discriminant analysis model. That is, using HGLM to process nested data can increase prediction accuracy regarding household tenure choices. Furthermore, cross-validation is performed to analyze hit rate stability. The hit rate sequencing from this cross-validation is found to be consistent with the HGLM results, implying that the comparison of the three models in terms of hit rate performance prediction in this study is stable and reliable.
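
The hit rate is the share of households whose tenure choice is predicted correctly. One way to check that the "hit rate sequencing" is stable is to rank the models by hit rate within each cross-validation fold and confirm that every fold gives the same order. The sketch below illustrates this check. The function names, model keys and fold scores are made up and are not the paper's figures.

```python
def hit_rate(y_true, y_pred):
    """Share of households whose tenure choice (own/rent) is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ranking(scores):
    """Model names ordered from highest to lowest hit rate."""
    return tuple(sorted(scores, key=scores.get, reverse=True))


def ranking_is_stable(fold_scores):
    """True if every fold ranks the models in the same order."""
    return len({ranking(s) for s in fold_scores}) == 1
```

For example, with hypothetical folds `[{"HGLM": 0.80, "logit": 0.70, "DA": 0.65}, {"HGLM": 0.78, "logit": 0.72, "DA": 0.60}]`, both folds rank HGLM first, so `ranking_is_stable` returns `True`.
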

Validation and analysis of the geographical origin of Angelica sinensis (Oliv.) Diels using multi-element and stable isotopes

PeerJ ◽

10.7717/peerj.11928 ◽

2021 ◽

Vol 9 ◽

pp. e11928

Author(s):

Shanjia Li ◽

Hui Wang ◽

Ling Jin ◽

James F. White ◽

Kathryn L. Kingsley ◽

...

Keyword(s):

Mass Spectrometry ◽

Stable Isotopes ◽

Discriminant Analysis ◽

Industrial Development ◽

Cross Validation ◽

Geographical Origin ◽

Mineral Elements ◽

Partial Least Square ◽

Angelica Sinensis ◽

Linear Discriminant

Background: Place of origin is an important factor when determining the quality and authenticity of Angelica sinensis for medicinal use. Tracing the origin and confirming the regional characteristics of medicinal products is important for sustainable industrial development. Detecting stable isotopes and mineral elements may be an effective way to trace and confirm the material's origin.

Methods: To better identify its origin, we studied 25 A. sinensis samples collected from three main producing areas (Linxia, Gannan, and Dingxi) in southeastern Gansu Province, China. We used inductively coupled plasma mass spectrometry (ICP-MS) and stable isotope ratio mass spectrometry (IRMS) to determine eight mineral elements (K, Mg, Ca, Zn, Cu, Mn, Cr, Al) and three stable isotopes (δ¹³C, δ¹⁵N, δ¹⁸O). Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and linear discriminant analysis (LDA) were used to verify the validity of its geographical origin.

Results: K, Ca/Al, δ¹³C, δ¹⁵N and δ¹⁸O are important variables for distinguishing A. sinensis sampled from Linxia, Gannan and Dingxi. We used an unsupervised PCA model to reduce the dimensionality of the mineral-element and stable-isotope data. This model distinguished A. sinensis from Linxia but could not easily separate samples from Gannan and Dingxi. The supervised PLS-DA and LDA models effectively distinguished samples from all three regions, and both were evaluated by cross-validation. The cross-validation accuracy of PLS-DA using mineral elements and stable isotopes was 84%, higher than that of LDA using the same variables.

Conclusions: The PLS-DA and LDA models provide a theoretical basis for tracing the origin of A. sinensis in three regions (Linxia, Gannan and Dingxi). This is significant for protecting consumers' health, rights and interests.
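
The LDA step with cross-validation can be sketched in the standard library. This is a textbook linear discriminant with a pooled within-class covariance and leave-one-out evaluation, not the authors' software, and PLS-DA is not shown. The small `ridge` term is an assumption added so that the pooled covariance stays invertible when it would otherwise be singular. The function names and the toy region data in the test are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_lda(X, y, ridge=1e-6):
    """Linear discriminant functions from class means and the pooled covariance."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    means, priors = {}, {}
    for k in classes:
        rows = [x for x, t in zip(X, y) if t == k]
        means[k] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[k] = len(rows) / n
    # Pooled within-class covariance; the ridge keeps it invertible.
    S = [[0.0] * p for _ in range(p)]
    for x, t in zip(X, y):
        d = [xi - mi for xi, mi in zip(x, means[t])]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j] / (n - len(classes))
    for i in range(p):
        S[i][i] += ridge
    model = {}
    for k in classes:
        w = solve(S, means[k])  # Sigma^{-1} mu_k
        bias = -0.5 * sum(wi * mi for wi, mi in zip(w, means[k])) + math.log(priors[k])
        model[k] = (w, bias)
    return model


def predict_lda(model, x):
    # Class with the largest linear discriminant score w_k . x + bias_k.
    return max(model, key=lambda k: sum(wi * xi for wi, xi in zip(model[k][0], x)) + model[k][1])


def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy of the LDA rule."""
    hits = 0
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict_lda(model, X[i]) == y[i]
    return hits / len(X)
```

`X` is expected as a list of feature lists, for example one row per sample holding the selected element and isotope values. Leave-one-out suits small studies like this one (25 samples) because every sample is used for testing exactly once.
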

Rapid detection of SARS-CoV-2 infection by multicapillary column coupled ion mobility spectrometry (MCC-IMS) of breath. A proof of concept study

10.1101/2020.06.30.20143347 ◽

2020 ◽

Cited By ~ 2

Author(s):

Claus Steppert ◽

Isabel Steppert ◽

Gunther Becher ◽

William Sterlacci ◽

Thomas Bollinger

Keyword(s):

Discriminant Analysis ◽

Ion Mobility ◽

Influenza A ◽

Cross Validation ◽

Viral Disease ◽

PCR Analysis ◽

Proof Of Concept ◽

Influenza Epidemic ◽

RT-PCR ◽

Non-Invasive

There is an urgent need to screen patients for communicable viral diseases in order to cut infection chains. We recently demonstrated that MCC-IMS of breath can identify Influenza-A-infected patients. As the Influenza epidemic declined and SARS-CoV-2 infections emerged, we extended our study to patients with suspected SARS-CoV-2 infection. Fifty-one patients (23 m, 28 f; aged 64 ± 16 years) were included in this study. Besides RT-PCR analysis of nasopharyngeal swabs, all patients underwent MCC-IMS analysis of breath. Sixteen patients (7 m, 9 f) were positive for SARS-CoV-2 by RT-PCR. There was no difference in gender or age between the groups. Under cross-validation, stepwise canonical discriminant analysis correctly classified 98% of the infected and non-infected subjects. We then combined the Influenza-A sub-study and the SARS-CoV-2 sub-study for a total of 75 patients (34 m, 41 f; aged 64.8 ± 1.8 years). Of these, 14 were positive for Influenza-A and 16 for SARS-CoV-2, and the remaining 44 patients were used as controls. In one patient, RT-PCR was highly suspicious of SARS-CoV-2 but inconclusive. There was no imbalance between the groups for age or gender. Discriminant analysis correctly classified 97.3% of the patients to their respective group. Even the inconclusive patient could be mapped to the SARS-CoV-2 group by applying the discrimination function.

Conclusion: MCC-IMS is able to detect SARS-CoV-2 infection and Influenza-A infection in breath. Because this method provides exact, fast, non-invasive diagnosis, it should be further developed for screening of communicable viral diseases.

Study registration: NCT04282135
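
A quick arithmetic check relates the reported percentages to patient counts. The counts below are inferred from the percentages and group sizes in the abstract; the authors do not state them. The 98% figure matches 50 of 51 patients correctly classified. The group sizes add up to 75 once the inconclusive patient is included. The 97.3% figure matches two misclassified patients whether that patient is counted (73 of 75) or not (72 of 74).

```latex
\frac{50}{51} \approx 98.0\,\%, \qquad
14 + 16 + 44 + 1 = 75, \qquad
\frac{73}{75} \approx 97.3\,\%, \qquad
\frac{72}{74} \approx 97.3\,\%
```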
